Run-Time Optimization of Sparse Matrix-Vector Multiplication on SIMD Machines

Authors

  • Louis H. Ziantz
  • Can C. Özturan
  • Boleslaw K. Szymanski
Abstract

Sparse matrix-vector multiplication forms the heart of iterative linear solvers used widely in scientific computations (e.g., finite element methods). In such solvers, the matrix-vector product is computed repeatedly, often thousands of times, with updated values of the vector until convergence is achieved. In an SIMD architecture, each processor has to fetch the updated off-processor vector elements while computing its share of the product. In this paper, we report on run-time optimization of array distribution and off-processor data fetching to reduce both the communication and computation time. The optimization is applied to a sparse matrix stored in a compressed sparse row-wise format. Actual runs on test matrices produced up to a 35 percent relative improvement over a block distribution with a naive multiplication algorithm, while simulations over a wider range of processors indicate that up to a 60 percent improvement may be possible in some cases.
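To make the setting in the abstract concrete, the following is a minimal sketch of a naive matrix-vector product over a compressed sparse row (CSR) matrix, together with a count of the off-processor vector elements each processor would need under a plain block distribution. It is not the paper's optimization, and it runs sequentially rather than on SIMD hardware. The names `csr_matvec`, `block_owner`, and `off_processor_fetches`, the 4x4 test matrix, and the choice of two processors are all assumptions made for illustration.

```python
def csr_matvec(row_ptr, col_idx, values, x):
    """Naive y = A @ x for a matrix A stored in CSR form."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        # Nonzeros of row i live in positions row_ptr[i] .. row_ptr[i+1]-1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y


def block_owner(index, n, num_procs):
    """Owner of a row or vector element under a block distribution (assumed layout)."""
    block = -(-n // num_procs)  # ceiling division: elements per processor
    return index // block


def off_processor_fetches(row_ptr, col_idx, n, num_procs):
    """Count the distinct vector elements each processor must fetch from others."""
    needed = [set() for _ in range(num_procs)]
    for i in range(n):
        p = block_owner(i, n, num_procs)
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if block_owner(j, n, num_procs) != p:
                needed[p].add(j)
    return [len(s) for s in needed]


# Hypothetical 4x4 example:
# [[4, 0, 0, 1],
#  [0, 3, 0, 0],
#  [0, 2, 5, 0],
#  [1, 0, 0, 6]]
row_ptr = [0, 2, 3, 5, 7]
col_idx = [0, 3, 1, 1, 2, 0, 3]
values = [4.0, 1.0, 3.0, 2.0, 5.0, 1.0, 6.0]
x = [1.0, 2.0, 3.0, 4.0]

y = csr_matvec(row_ptr, col_idx, values, x)
fetches = off_processor_fetches(row_ptr, col_idx, 4, 2)
print(y)        # [8.0, 6.0, 19.0, 25.0]
print(fetches)  # [1, 2]
```

In an iterative solver these fetches recur on every iteration, which is why the abstract targets both the array distribution and the off-processor data fetching.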


Similar resources

Run-Time Optimization of Sparse Matrix-Vector Multiplication on SIMD

Sparse matrix-vector multiplication forms the heart of iterative linear solvers used widely in scientific computations (e.g., finite element methods). In such solvers, the matrix-vector product is computed repeatedly, often thousands of times, with updated values of the vector until convergence is achieved. In an SIMD architecture, each processor has to fetch the updated off-processor vector elemen...


Breaking the performance bottleneck of sparse matrix-vector multiplication on SIMD processors

The low utilization of SIMD units and memory bandwidth is the main performance bottleneck on SIMD processors for sparse matrix-vector multiplication (SpMV), which is one of the most important kernels in many scientific and engineering applications. This paper proposes a hybrid optimization method to break the performance bottleneck of SpMV on SIMD processors. The method includes a new sparse ma...


Optimization by Run-time Specialization for Sparse Matrix-Vector Multiplication (Submitted for publication)

Run-time specialization is the process of generating programs based on information available only at run time. This technique has the potential of generating highly efficient codes, at the expense of the overheads of the run-time code generation. It is applicable when some input data is used repeatedly while other input data varies. In this paper we explore the potential for obtaining speedups ...


Optimization of Sparse Matrix-Vector Multiplication by Specialization

Program specialization is the process of generating optimized programs based on available inputs. It is particularly applicable when some input data are used repeatedly while other input data vary. Specialization can be employed at compile-time as well as at run-time, depending on when the inputs become available. In this paper we explore the potential for obtaining speed-ups for sparse matrix-...


When to Cache Block Sparse Matrix Multiplication: A Statistical Learning Approach

In previous work it was found that cache blocking of sparse matrix-vector multiplication yielded significant performance improvements (up to 700% on some matrix and platform combinations); however, deciding when to apply the optimization is a non-trivial problem. This paper applies four different statistical learning techniques to explore this classification problem. The statistical techniques use...



Publication date: 1994